Two different versions of the count matrix exist: the one from GEO and the one created by us personally from cell ranger
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 3235143 172.8 5279980 282.0 NA 5279980 282.0
## Vcells 64592090 492.8 142652204 1088.4 16384 129874601 990.9
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Transformation introduced infinite values in continuous x-axis
## used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 3458735 184.8 5279980 282 NA 5279980 282.0
## Vcells 65798104 502.0 801505012 6115 16384 990044566 7553.5
Decided to go with GEO matrix for analysis (probably no big difference in the end)
## Warning: Feature names cannot have underscores ('_'), replacing with dashes
## ('-')
## An object of class Seurat
## 15619 features across 3399 samples within 1 assay
## Active assay: RNA (15619 features, 0 variable features)
## An object of class Seurat
## 15619 features across 2260 samples within 1 assay
## Active assay: RNA (15619 features, 0 variable features)
## When using repel, set xnudge and ynudge to 0 for optimal results
## Warning: Transformation introduced infinite values in continuous x-axis
## Transformation introduced infinite values in continuous x-axis
## Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
## Centering and scaling data matrix
## PC_ 1
## Positive: TXN, LDHB, GAPDH, YBX1, TPI1, MIF, SLC25A5, RPL35, NME1, ENO1
## NDUFS6, UQCRQ, RPL36, PRDX2, SNRPE, PRELID1, SSBP1, RAN, LSM4, APRT
## PHB, PSMA7, RPL7A, NPM1, PGAM1, LDHA, SHFM1, NHP2, SLIRP, RANBP1
## Negative: RPS4Y1, HLA-DPA1, HLA-DPB1, HLA-DQA2, HLA-DRA, HLA-DRB1, HLA-DRB5, HLA-DQA1, CD69, EVL
## HLA-DQB1, PMAIP1, CD79A, ITGB2, REL, BANK1, CD83, PTPRC, BIN1, IRF8
## MYO1F, CD99, FNBP1, PHACTR1, MYO1G, HCST, NKG7, INPP5D, CELF2, CLEC2B
## PC_ 2
## Positive: IGLC3, MAFB, IGLV2-14, SSR4, IGHG4, DEPTOR, IGHG3, LGALS1, S100A4, IGHG1
## GULP1, HGF, PEBP1, SQLE, CBLN2, MT-CYB, TYMP, PIM2, EMP3, HSPA5
## PBXIP1, RPLP0, CD44, CAV1, IGHA2, AC074289.1, METTL7A, TSC22D3, NTRK2, EVI2A
## Negative: RPS4Y1, ITGB2, CD99, STMN1, HCST, EVL, KIAA0101, NKG7, TMSB4X, PTPRC
## GNLY, MYBL2, TYMS, CST7, ZWINT, CCL5, HLA-DPA1, GZMA, RRM2, MYO1F
## ASF1B, MKI67, TK1, HMGB2, CTSW, HMGN2, DHFR, HOPX, HLA-DQA2, CD7
## PC_ 3
## Positive: NKG7, GNLY, HCST, ITGB2, GZMA, CCL5, CST7, CTSW, CD7, PRF1
## KLRB1, HOPX, CD247, GZMB, EVL, LCK, IFITM2, ID2, KLRF1, MYO1F
## TRDC, TMSB4X, MATK, TYROBP, KLRD1, GZMM, CD99, GZMH, SPON2, FCER1G
## Negative: STMN1, RRM2, MKI67, KIAA0101, CDCA5, MYBL2, ZWINT, DHFR, ASF1B, TK1
## AURKB, SPC25, CDK1, SKA1, CLSPN, KIFC1, TYMS, TOP2A, PBK, HMGB2
## WDR34, BIRC5, CENPN, PHF19, CCNA2, HIST1H4C, SKA3, SHCBP1, CENPU, H2AFZ
## PC_ 4
## Positive: NKG7, GNLY, CCL5, GZMA, CTSW, CST7, PRF1, CD7, KLRB1, CD247
## GZMB, TRDC, GZMH, KLRF1, KLRD1, MATK, SPON2, LCK, GZMM, TRBC1
## IL32, ZAP70, CLIC3, FGFBP2, HCST, KLRC1, AOAH, ADGRG1, ID2, PRKCH
## Negative: CD74, HLA-DRA, HLA-DPA1, HLA-DRB1, HLA-DPB1, HLA-DQA2, CD79A, HLA-DRB5, HLA-DQA1, HLA-DMA
## HLA-DQB1, CD37, CD52, HLA-DMB, LAPTM5, BANK1, CD22, BLK, CD83, IRF8
## RPS4Y1, HHEX, IGHD, CD19, TNFRSF13C, PHACTR1, LRMP, ARID5B, LTB, SWAP70
## PC_ 5
## Positive: LGALS1, S100A6, S100A4, S100A11, ZBP1, MT-CO1, PBXIP1, KLF6, EMP3, ISG20
## ADGRE5, DEPTOR, NEAT1, OAS1, ISG15, POU2F2, IFI16, JUN, DUSP1, CRIP1
## RP11-81H14.2, S100A10, SAMD9L, EVI2A, LMNA, IL16, HIST1H1C, AHNAK, MX1, IRF7
## Negative: HMGA1, HSPE1, NPM1, ECE2, FKBP4, CCND2, EBNA1BP2, TOMM40, EXOSC4, CACYBP
## FABP5, ATAD3B, LDHA, LSM7, HSPD1, PA2G4, NME1, PHB, WDR4, CISD3
## MCM7, SCARB1, NOP16, CCT2, PAICS, SNRPA1, POLR2H, NOP58, TIMM13, MRPL4
## Find clusters
## Computing nearest neighbor graph
## Computing SNN
## Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
##
## Number of nodes: 2260
## Number of edges: 71089
##
## Running Louvain algorithm...
## Maximum modularity in 10 random starts: 0.8306
## Number of communities: 7
## Elapsed time: 0 seconds
## Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
## To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
## This message will be shown once per session
## 14:30:23 UMAP embedding parameters a = 0.9922 b = 1.112
## 14:30:23 Read 2260 rows and found 10 numeric columns
## 14:30:23 Using Annoy for neighbor search, n_neighbors = 30
## 14:30:23 Building Annoy index with metric = cosine, n_trees = 50
## 0% 10 20 30 40 50 60 70 80 90 100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 14:30:23 Writing NN index file to temp file /var/folders/p4/zn4sp81s5gq35dzmr0nknsjc0000gn/T//Rtmps8fWRq/file21fe7524447
## 14:30:23 Searching Annoy index using 1 thread, search_k = 3000
## 14:30:23 Annoy recall = 100%
## 14:30:23 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
## 14:30:24 Initializing from normalized Laplacian + noise (using irlba)
## 14:30:24 Commencing optimization for 500 epochs, with 92016 positive edges
## 14:30:27 Optimization finished
Specific MM marker genes: B2M, CCND1, Immunoglobulins (IGHM, IGKC)
Source: https://www.cancer.gov/about-cancer/diagnosis-staging/diagnosis/tumor-markers-list
Additional marker used in the Numbat paper for MM: MZB1
## p_val avg_log2FC pct.1 pct.2 p_val_adj
## RPS4Y1 0.000000e+00 10.56039 0.058 0.953 0.000000e+00
## HLA-DRB1 8.572363e-274 -85.19576 0.042 0.809 1.338917e-269
## HLA-DPA1 1.149036e-271 -82.28723 0.054 0.844 1.794679e-267
## HLA-DQA2 3.944242e-266 -10.89713 0.006 0.619 6.160511e-262
## HLA-DPB1 3.530689e-252 -137.06263 0.069 0.848 5.514583e-248
## HLA-DQA1 8.267348e-246 -22.22866 0.013 0.619 1.291277e-241
## [1] 3516
##
## aneuploid diploid not.defined
## control 13 208 36
## MM 1992 4 7
##
## filtered normal tumor
## control 21 223 13
## MM 6 4 1993
##
## filtered normal tumor
## aneuploid 0 0 2005
## diploid 0 212 0
## not.defined 27 15 1